74 research outputs found

Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

Author: Berend Gábor
Publication date: 21/12/2016

In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e. 150 sentences per language.
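
As an illustration of what "sparse indicator features" can look like, here is a minimal sketch. It assumes a sparse code for a word has already been computed by some sparse coding step, and it assumes that each nonzero coefficient contributes one feature built from its sign and index. The function name, the `P`/`N` naming scheme, and the tolerance are hypothetical and are not taken from the abstract.

```python
from typing import List, Sequence


def sparse_indicator_features(alpha: Sequence[float], tol: float = 1e-8) -> List[str]:
    """Return one indicator feature per nonzero coefficient of a sparse code.

    Assumption: a feature records only the sign and the index of a nonzero
    coefficient, not its magnitude. The "P"/"N" prefixes are illustrative.
    """
    features = []
    for index, value in enumerate(alpha):
        if abs(value) > tol:
            sign = "P" if value > 0 else "N"
            features.append(f"{sign}{index}")
    return features


# Hypothetical sparse code of one word over a five-atom dictionary.
example_code = [0.0, 0.73, 0.0, -0.12, 0.0]
print(sparse_indicator_features(example_code))  # ['P1', 'N3']
```

Features of this kind could then be passed to a sequence labeler in place of the dense vector itself.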

arXiv.org e-Print Archive

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Opinion Expression Mining by Exploiting Keyphrase Extraction

Author: Berend Gábor
Publication venue: Asian Federation of Natural Language Processing
Publication date: 01/01/2011

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Analysing the semantic content of static Hungarian embedding spaces

Authors: Berend Gábor, Ficsor Tamás
Publication date: 01/01/2021

Word embeddings can encode semantic features and have achieved many recent successes in solving NLP tasks. Although word embeddings perform well on several downstream tasks, there is no trivial approach to extract lexical information from them. We propose a transformation that amplifies desired semantic features in the basis of the embedding space. We generate these semantic features by a distantly supervised approach, to make them applicable for Hungarian embedding spaces. We propose using the Hellinger distance to perform a transformation to an interpretable embedding space. Furthermore, we extend our research to sparse word representations as well, since sparse representations are considered to be highly interpretable.
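
For reference, the Hellinger distance between two discrete probability distributions can be computed as in the sketch below. This is the standard textbook definition only. How the paper applies the distance to embedding dimensions is not stated in the abstract, and the function name and example distributions are illustrative assumptions.

```python
import math
from typing import Sequence


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions p and q.

    H(p, q) = sqrt( 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2 ), which lies in [0, 1].
    Both inputs are assumed to be non-negative and to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    squared_diffs = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared_diffs / 2.0)


# Identical distributions are at distance 0; disjoint ones are at distance 1.
print(hellinger_distance([0.5, 0.5], [0.5, 0.5]))
print(hellinger_distance([1.0, 0.0], [0.0, 1.0]))
```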

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

University of Szeged

Nyelvspecifikus transzformer modellek közötti megfeleltetéssel történő zero-shot jelentésegyértelműsítés [Zero-shot word sense disambiguation by aligning language-specific transformer models]

Author: Berend Gábor
Publication date: 01/01/2022

In this paper we present a procedure that performs word sense disambiguation in a zero-shot manner, relying on language-specific transformer models. The proposed method builds cross-lingual knowledge transfer on monolingual pretrained models created specifically for processing the source language, which has training data, and the target language, which lacks training data. We establish the connection between the languages by learning a mapping that aligns the hidden layers of the monolingual transformer models. Our results show that the performance of models created this way, trained exclusively on English sense-annotated text, can be significantly improved compared with applying a multilingual masked language model.

University of Szeged

Utilizing word embeddings for part-of-speech tagging

Author: Berend Gábor
Publication date: 01/01/2016

In this paper, we illustrate the power of distributed word representations for the part-of-speech tagging of Hungarian texts. We trained CRF models for POS tagging that made use of features derived from the sparse coding of the word embeddings of Hungarian words as signals. We show that by relying on such a representation, it is possible to avoid the creation of language-specific features for achieving reliable performance. We evaluated our models on all the subsections of the Szeged Treebank using both the MSD and universal morphology tag sets. Furthermore, we also report results for inter-subcorpora experiments.

University of Szeged

Regularization of word embeddings for multi-word expression identification

Author: Berend Gábor
Publication date: 01/01/2018

In this paper we compare the effects of applying various state-of-the-art word representation strategies in the task of multi-word expression (MWE) identification. In particular, we analyze the strengths and weaknesses of the usage of ℓ1-regularized sparse word embeddings for identifying MWEs. Our earlier study demonstrated the effectiveness of regularized word embeddings in other sequence labeling tasks, i.e. part-of-speech tagging and named entity recognition, but they have not yet been rigorously evaluated for the identification of MWEs.

University of Szeged

Látens szemantikus eloszlások használata a nyelvi modellek előtanítása során [Using latent semantic distributions during the pretraining of language models]

Author: Berend Gábor
Publication venue: Szegedi Tudományegyetem
Publication date: 01/01/2023

Our paper presents a variant of language model pretraining in which the objective of masking is not to reconstruct the randomly selected tokens but to determine their semantic category. Fine-tuning the models created in the proposed way on a variety of benchmarks, we find that they achieve significantly better results than their conventional counterparts.

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

University of Szeged